Corpus-Based Chinese-Korean Abstracting Translation System
Authors
Abstract
A Corpus-Based Chinese-Korean Abstracting Translation System is designed and implemented. First, a text indexing method called Natural Hierarchical Network (NHN) is introduced, and then a Corpus-Based Word Segmentation algorithm is developed that achieves 98% segmentation correctness on an open test. A word weighting function and a sentence importance weighting function dynamically calculate the importance of words and sentences using word frequency in both the corpus and the context, word length, sentence length, and so on. Based on these functions, an abstracting system is implemented that produces abstracts of texts in different languages and domains at any abstracting rate. Experiments show that abstracts produced at abstracting rates of 10% to 20% generally cover 90% of the important sentences of the input texts. Finally, combined with an Example-Based Chinese-Korean Machine Translation System, the generated abstracts are translated into the target language by an important-word-oriented machine translation strategy, with translation correctness of more than 70%.
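The abstract does not give the weighting functions themselves, so the following is only a minimal sketch of how a frequency-based extractive abstractor of this kind could be put together. Everything in it is an assumption made for illustration: the function names `word_weights` and `summarize`, the particular formulas (in-text frequency times a log corpus-rarity term times a log word-length term, and a sentence score normalized by the square root of sentence length), and the input format of pre-segmented sentences. It is not the authors' method.

```python
import math
from collections import Counter


def word_weights(words, corpus_freq, corpus_size):
    """Hypothetical word weight: in-text frequency x corpus rarity x word length."""
    tf = Counter(words)
    weights = {}
    for w, f in tf.items():
        # Words that are frequent in the reference corpus get a rarity near 0.
        rarity = math.log((corpus_size + 1) / (corpus_freq.get(w, 0) + 1))
        weights[w] = f * rarity * math.log(1 + len(w))
    return weights


def summarize(sentences, corpus_freq, corpus_size, rate=0.2):
    """Return indices of the highest-scoring sentences, in original order.

    sentences: list of already segmented sentences (each a list of words).
    rate: abstracting rate, the fraction of sentences to keep (at least one).
    """
    all_words = [w for s in sentences for w in s]
    weights = word_weights(all_words, corpus_freq, corpus_size)
    scored = []
    for i, s in enumerate(sentences):
        # Assumed length normalization so long sentences do not always win.
        score = sum(weights[w] for w in s) / math.sqrt(len(s)) if s else 0.0
        scored.append((score, i))
    keep = max(1, round(len(sentences) * rate))
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:keep]
    return sorted(i for _, i in top)


if __name__ == "__main__":
    doc = [
        ["the", "system", "translates", "abstracts"],
        ["the", "the", "the"],
        ["abstracts", "cover", "important", "sentences"],
        ["the", "a"],
        ["system", "abstracts", "system"],
    ]
    # Toy corpus counts; unseen words are treated as rare.
    freq = {"the": 1000, "a": 800}
    print(summarize(doc, freq, corpus_size=1000, rate=0.6))
```

In this sketch the abstracting rate is simply the fraction of sentences retained, which is one plausible reading of "any abstracting rate" in the abstract; the real system may define it differently, for example by character count.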
Similar resources
A Chinese POS Decision Method Using Korean Translation Information
In this paper we propose a method that imitates a translation expert by using Korean translation information, and we analyse its performance. Korean is easier to tag than Chinese, so we can use this property in Chinese POS tagging. Keywords: machine translation, part-of-speech tagging, corpus. Introduction: Previous POS (Part Of Speech) tagging methods for Chinese can be largely classified into 2. O...
The Use of Second-Person Reference in Advertisement Translation with Reference to Translation between Chinese and English
This research aimed to review the use of second-person reference in advertisement translation, work out the general rules, and provide guidance to translators. Using second-person reference is common in the advertising discourse. Addressing audiences directly involves their attention and in this way enhances their memorization of the advertised message. Second-person reference can be realized v...
Korean-Chinese-Japanese Multilingual Wordnet with Shared Semantic Hierarchy
A Chinese-Japanese-Korean wordnet is introduced. It is constructed based on a shared semantic hierarchy that is originated from NTT Goidaikei (Lexical Hierarchical System). Korean wordnet was constructed through the semantic category assignment to every sense of Korean words in a dictionary. Verbs and adjectives’ senses are assigned to the same semantic hierarchy as that of nouns. Each sense of...
Bayesian Learning of Tokenization for Machine Translation
Training a statistical machine translation system starts with tokenizing a parallel corpus. Some languages such as Chinese do not incorporate spacing in their writing system, which creates a challenge for tokenization. Morphologically rich languages such as Korean and Hungarian present an even bigger challenge, since optimal token boundaries for machine translation in these languages are often ...
Unsupervised Tokenization for Machine Translation
Training a statistical machine translation starts with tokenizing a parallel corpus. Some languages such as Chinese do not incorporate spacing in their writing system, which creates a challenge for tokenization. Moreover, morphologically rich languages such as Korean present an even bigger challenge, since optimal token boundaries for machine translation in these languages are often unclear. Bo...
Journal title:
Volume, issue:
Pages: -
Publication date: 1997